Topic Modeling Using n-Grams and Non-negative Matrix Factorization

Abstract

Topic modeling is a machine learning technique used to discover the topics in a collection of text documents. In this study, the topic modeling method is Non-Negative Matrix Factorization (NMF) combined with n-grams. Preprocessing steps such as removing punctuation, numbers, and stopwords are applied, and preprocessing begins by converting every word in the articles to lowercase. The study also explores how effective unigrams, bigrams, and trigrams are for topic modeling, and uses the coherence value to determine the best number of topics that can be formed. The data consist of 53,920 news articles from the RMOL.id and BeritaSatu.com portals, published between July and December 2022. t-SNE visualization is used to show the distribution of the formed topics. The results show that the best number of topics is 15 with unigrams (coherence 0.812748), 10 with bigrams (0.835738), and 7 with trigrams (0.830572). Meanwhile, the second set of results yields coherence values of 0.799718, 0.788762, and 0.801935.
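The pipeline summarized in the abstract can be sketched in plain Python. Its steps are lowercasing, removing punctuation, numbers, and stopwords, building unigram, bigram, or trigram terms, factorizing with NMF, and choosing the number of topics by coherence. This is a minimal illustration under stated assumptions, not the authors' implementation:

- All function names, the stopword list, and the sample headlines are hypothetical.
- The document-term matrix uses raw counts, because the abstract does not say whether TF-IDF weighting was used.
- NMF is solved with the Lee-Seung multiplicative updates for the Frobenius objective.
- The UMass coherence score is swapped in as a simple stand-in, because the abstract does not name the coherence measure.

The t-SNE visualization step is not covered.

```python
import math
import random
import re

# Illustrative sketch only: every function name here is hypothetical, not from the paper.
# Hypothetical stopword list; the study's actual list is not given.
STOPWORDS = {"dan", "di", "yang", "ini", "itu", "ke", "dari"}


def preprocess(text, stopwords=STOPWORDS):
    """Lowercase first, then drop punctuation, digits and stopwords."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())  # keeps ASCII letters only
    return [tok for tok in text.split() if tok not in stopwords]


def ngrams(tokens, n):
    """n=1 gives unigrams, n=2 bigrams, n=3 trigrams."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def doc_term_matrix(docs, n):
    """Raw n-gram counts (rows = documents, columns = sorted vocabulary),
    plus the set of n-grams in each document for coherence scoring."""
    grams = [ngrams(preprocess(d), n) for d in docs]
    vocab = sorted({g for doc in grams for g in doc})
    col = {term: j for j, term in enumerate(vocab)}
    V = [[0.0] * len(vocab) for _ in docs]
    for i, doc in enumerate(grams):
        for g in doc:
            V[i][col[g]] += 1.0
    return V, vocab, [set(doc) for doc in grams]


def matmul(A, B):
    """Plain-list matrix product A @ B."""
    return [[sum(a * b for a, b in zip(row, c)) for c in zip(*B)] for row in A]


def nmf(V, k, iters=200, seed=0, eps=1e-9):
    """Lee-Seung multiplicative updates for min ||V - WH||_F^2 with W, H >= 0.
    eps only guards against division by zero."""
    rng = random.Random(seed)
    m, n = len(V), len(V[0])
    W = [[rng.random() for _ in range(k)] for _ in range(m)]
    H = [[rng.random() for _ in range(n)] for _ in range(k)]
    for _ in range(iters):
        Wt = list(zip(*W))  # transpose of W
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)] for i in range(k)]
        Ht = list(zip(*H))  # transpose of the updated H
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(m)]
    return W, H


def top_terms(H, vocab, topn=3):
    """Highest-weighted vocabulary terms for each topic (row of H)."""
    topics = []
    for row in H:
        order = sorted(range(len(row)), key=row.__getitem__, reverse=True)
        topics.append([vocab[c] for c in order[:topn]])
    return topics


def umass(terms, doc_sets):
    """UMass coherence of one topic's ranked terms, averaged over term pairs.
    A stand-in: the abstract does not name the coherence measure used."""
    score, pairs = 0.0, 0
    for i in range(1, len(terms)):
        for j in range(i):  # terms[j] ranks above terms[i]
            d_j = sum(terms[j] in d for d in doc_sets)
            d_ij = sum(terms[i] in d and terms[j] in d for d in doc_sets)
            score += math.log((d_ij + 1) / d_j)
            pairs += 1
    return score / pairs if pairs else 0.0


def best_k(docs, n, candidates):
    """Fit NMF for each candidate topic count; keep the highest mean coherence."""
    V, vocab, doc_sets = doc_term_matrix(docs, n)
    scores = {}
    for k in candidates:
        _, H = nmf(V, k)
        scores[k] = sum(umass(t, doc_sets) for t in top_terms(H, vocab)) / k
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    docs = [  # hypothetical toy headlines, not data from the study
        "Harga beras naik 10% di pasar.",
        "Harga beras dan harga minyak naik lagi!",
        "Timnas sepak bola menang 2-0 di laga final.",
        "Timnas sepak bola lolos ke final piala.",
    ]
    for n in (1, 2, 3):  # unigram, bigram, trigram
        k, scores = best_k(docs, n, candidates=[2, 3])
        print(f"n={n}: best k={k}, mean coherence per k={scores}")
```

Running the script prints, for each n-gram order, the selected topic count and the mean coherence of every candidate. At the scale of the study (53,920 articles), a library implementation would be the practical choice. Examples include scikit-learn's `TfidfVectorizer(ngram_range=(n, n))` with `sklearn.decomposition.NMF`, and gensim's `CoherenceModel` for coherence scoring.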

Similar articles

Bayesian Non-negative Matrix Factorization

We present a Bayesian treatment of non-negative matrix factorization (NMF), based on a normal likelihood and exponential priors, and derive an efficient Gibbs sampler to approximate the posterior density of the NMF factors. On a chemical brain imaging data set, we show that this improves interpretability by providing uncertainty estimates. We discuss how the Gibbs sampler can be used for model ...

Robust non-negative matrix factorization

Non-negative matrix factorization (NMF) is a recently popularized technique for learning parts-based, linear representations of non-negative data. The traditional NMF is optimized under the Gaussian noise or Poisson noise assumption, and hence not suitable if the data are grossly corrupted. To improve the robustness of NMF, a novel algorithm named robust nonnegative matrix factorization (RNMF) i...

Non-Negative Multiple Matrix Factorization

Non-negative Matrix Factorization (NMF) is a traditional unsupervised machine learning technique for decomposing a matrix into a set of bases and coefficients under the non-negative constraint. NMF with sparse constraints is also known for extracting reasonable components from noisy data. However, NMF tends to give undesired results in the case of highly sparse data, because the information inc...

Pruning sparse non-negative matrix n-gram language models

In this paper we present a pruning algorithm and experimental results for our recently proposed Sparse Non-negative Matrix (SNM) family of language models (LMs). We show that when trained with only n-gram features SNMLM pruning based on a mutual information criterion yields the best known pruned model on the One Billion Word Language Model Benchmark, reducing perplexity by 18% and 57% over Ka...

Journal

Journal title: Jurnal Informasi dan Teknologi

Year: 2023

ISSN: 2714-9730

DOI: https://doi.org/10.60083/jidt.v5i1.385